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Here we study the effects of many-body interactions on rate and mechanism in protein folding, 
. . . using the results of molecular dynamics simulations on numerous coarse-grained Cc-model single- 

' domain proteins. After adding three-body interactions explicitly as a perturbation to a Go-like 

\ Hamiltonian with native pair- wise interactions only, we have found 1) a significantly increased 

. correlation with experimental 0-values and folding rates, 2) a stronger correlation of folding rate 

with contact order, matching the experimental range in rates when the fraction of three-body energy 
in the native state is ~ 20%, and 3) a considerably larger amount of 3-body energy present in 
Chymotripsin inhibitor than other proteins studied. 
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Understanding the nature of the interactions that stabilize protein structures and govern protein folding mechanisms 
is a fundamental problem of molecular biology , with applications to structure and function prediction 0, 

as well as rational enzyme design |ic| . Regarding folding mechanisms, protein folding has long been known to 
be a cooperative process, at least for smaller single-domain proteins Experimental scenarios that lack a first- 

order-like folding barrier are rare [l^ . often in contrast to simulation results. There are other discrepancies between 
simulation and experiment. For example, while the experimental folding rates for a typical set of 18 2-state, single 
" ' '■ domain proteins (given in Methods) span ^ 6 orders of magnitude, simulations of coarse-grained models of the same 
On [ proteins have rates that vary by about a factor of 100, a discrepancy of 4 orders of magnitude. 

, How does one then quantify the sources of the barrier that controls the folding rate? The folding barrier is the 
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residual of an incomplete cancellation of large and opposing energetic and entropic contributions, with the relative 



^ . resiauai ot an mcompiete cancellation oi large ana opposing energetic ana entropic 

. smallness of the barrier allowing folding to occur on biological time-scales lla, [fj] . Among the important energetic 

o . . . n 

contributions that drive folding are solvent-mediated hydrophobic forces L15| , which are known to be weaker on 



short length scales, or low concentrations of apolar side-chains 16]- a scenario likely to be present when the protein is 
unfolded. Hence the solvent-averaged potential governing folding almost certainly contains a non-additive, many- body 
component. The folding free energy barrier increases as the non-additivity of interactions is increased |l7lll8Lll9| . due 
to the decreased energetic correlation between the native conformation and conformations that may be geometrically 
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M , similar to it. 



Experimental 0-values give a measure of the strength of native interactions involving a particular amino acid 
(residue) in the transition state [2^, thus quantifying a residue's importance in folding. However the 0- values obtained 
from simulations of coarse-grained protein models generally do not correlate well with the experimentally determined 
values. Model proteins are coarse- grai ned on the belief that a reduced number of degrees of freedom can capture the 
essentials of the folding process |j, |2l|, |22| , however the less than ideal agreement with experimentally observed rates 
and mechanisms leads one to consider alternate forms for the coarse-grained Hamiltonian or energy function, as well as 
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to consider more detailed all-atom models [23|, |2J, |25j which may contain explicit solvent as well [6|, |25|, \26L 1271 l28l |2£ 

But it is also clear that coarse-grained simulations allow a study of microscopic dynamics that would not be possible 
by all-atom models with present-day computing power. Because we cannot yet fully analyze the statistics of fol ding 
trajectories in all-atom models, coarse-grained simulational models such as off-lattice Cq, models 0, |3 Isol Isil [sj 
have been essential in elucidating protein-folding mechanisms. 

We could then take the following approach: postulate a given feature thought to be present in the system and ask 
to what extent this feature, such as many-body potentials, must be present in the Hamiltonian of a coarse-grained 
model for best agreement with existing experimental data on protein folding rates and mechanisms. 



I. MATERIALS AND METHODS 
A. Simulation Model 



Eighteen two-state folding proteins with known native structures (PDB codes lAEY, lAPS, IFKB, IHRC, IMJC, 
INYF, ISRL, lUBQ, lYCC, 2AIT, 2CI2, IPTL, 2U1A, 1AB7, ICSP, ILMB, INMG, ISHG ) were selected for 
coarse-grained simulations. For all proteins except the last 5 above, rate data was available at various denaturant 
concentrations. These were then used for further analysis at the stability of the transition midpoint. 

The simulated proteins consist of a chain of connected beads, with each bead representing the position of the Cn atom 
in the corresponding amino acid. The off-lattice Cq Go model has been described in detail previously |22ll30ll34l l35| . 
The Hamiltonian has local and non-local parts: Bond, angle and dihedral angle potentials constitute local interactions. 
In the putative Go model, pair contacts between residues in spatial proximity in the native structure constitute non- 
local interactions. Non-native interactions are treated by a sterically repulsive pair-potential only. Heavy atoms 
within a cut-off distance of Vc = 4.8 A in the native structure obtained from the PDB file are associated with a 
Lennard-Jones-like 10-12 potential of depth 62 = ~k^T and a position of the minimum equal to the distance of the 
Cq atoms in the native structure. Let there be N2 pair-contacts of energy £2 in the native PDB structure. Then in an 
arbitrary conformation there are QN2 contacts with energy i?2 ~ £2 Q N2, with Q the fraction of native pair contacts 
(we account for the continuum nature of the Lennard- Jones potentials). 

We let triples with heavy atoms within a cutoff distance of 4.8 A in the native structure have an energy 63. For a 
given protein there will then be 3-body contacts present in the PDB native structure, with total 3-body energy 
e^N^. An arbitrary structure then has a 3-body contribution to the energy of E3 = £3 Q3 7V3, where Q3 is the fraction 
of native triples present in that conformation. Three-body interactions are again Go-like; the remaining bond, angle, 
dihedral, and non-native interaction energies are all unchanged. 

When both pair- wise and 3-body interactions are present, the native non-local part of the energy becomes: 

E^^{a)^{l-a)E2+aE3. (1) 

The free parameter a (0 < a < 1) controls the relative contribution of two- and three-body interactions. The energy 
per triple is assigned as £3 — £2 N2/N^, to preserve overall native stability. 

Dense sampling is obtained from long simulations with a purely 2-body Go Hamiltonian at the transition mid- 
point (e.g. for CI2 the simulation time corresponds to about 3 seconds, as determined from the number of folding 
and unfolding events). From histograms of the number of states at a given fraction of native contacts Q, the free 
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energy F{Q) can be constructed. All simulated free energy profiles displayed a single dominant barrier. All proteins 
are considered at their transition mid-points only, where the unfolded and folded free energies are equal: F„ = Fp 
(figure □ A). 

Three-body energies are treated as a perturbation on the Hamiltonian. The new free energy is given by the exact 
expression: 

where the sum is on all sampled conformations i, A{Q^''\Q) is a delta function that selects only those states where 
g« = Q, and AE{a) = E^^{a) - E^. 

B. Calculated r/)- values 



Simulated kinetic (^-values are given by 



{ni)F - {ni)u' 

where (n,) is the thermal mean value of number of contacts for residue i, and the 7^, U and F subscripts refer to the 
transition state, unfolded state and folded state ensembles respectively. 

We first compare simulated and experimental values using the thermal transition state ensemble (TTSE) around 
the free energy barrier peak, i.e. \F — F^ \ / AF^ < 0.2 was used to define a width AQ of the barrier peak (shaded 
in figure^A). Conformations within this range were taken to be the TTSE, and were used to calculate <j> values from 
equation 13 The validity of the TTSE was checked for CI2 and SH3 with a comparison of 0- values using the kinetic 
transition state ensemble (KTSE), selected as having a folding probability Pfold of roughly 1/2 js^l- Conformations 
in the TTSE were used as initial conditions for 100 simulations which were terminated when the protein folded or 
unfolded. Those conformations that had Si j>FOLD within 0.5 dz 100 were taken as tlie KXSE. For CI2 (SH3) we 
found 315 (283) KTSE configurations from a total of 2359 (2078) TTSE configurations. 

Other reaction coordinates were helpful in determining the kinetic transition state ensemble by constructing multi- 
dimensional reaction surfaces. To this end we found a contact-order weighted variant of Q to be useful, which for any 
configuration v is given by: 



E,<,-|»-j|A--A f 



O'^ = ^»<J ' (A) 



where the sum is over all Ca atoms, and A^j and A^ are unity if residues i and j are in contact in conformations ly 
and the native structure respectively, otherwise they are zero. 

We determined 0- values in the presence of three-body interactions analogously to eq. Q . Under some simplifying 
assumptions (e.g. requiring that the 0- value that is independent of the perturbation energies): 

_ (1 - a) ((n,) J) - (n,)[r^) N, + a {{m,)f - {m,)^^^)N^ 



(1 - a) {{n^p - {n,p) N, + a {{m,)^f - {m,)\f)N: 
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Here mi is the number of three body interactions in which monomer i is involved, and superscript (a) indicates 
averaging the ensembles (7^, C/, F) in the presence of 3-body energy. When a — > 0, l(SJl reduces to Q. 
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C. Miyazawa-Jernigan-based Models 

The effect of heterogenity in the model was also studied by interpolating between the Go model and the Miyazawa- 
Jernigan (MJ) models by varying the free parameter a between zero (Homogeneous Go model) and unity (MJ model). 
The contact energy for any pair of residues (not necessarily native) is then: 

= (1 - a)e2 + ae^J , (6) 



where £2 is as above, and e^*' was proportional to the MJ interaction energy |38| between the residue types of i and j, 
scaled by a factor to ensure the energy of the native structure is a-independent. An interpolation between a uniform 
Go model and a heterogeneous Go model with native contact energies given by MJ parameters was also considered. 

D. Contact Order and Statistical Significance 

Absolute contact order is the average sequence separation between residues having native contacts js^: aCO — 
J2i>j N ^ where M is the total number of native contacts. Relative contact order is scaled again by chain 
length N: rCO = aCO/N. 

Statistical significance or P-value is the probability to achieve a given correlation coefficient, r, assuming random 
data: P = erf(|r| N/2). Small data sets almost always have fairly large P, even if r is large. Large data sets may 
still have small P even if the correlation is weak, which would still indicate a systematic effect. 

II. RESULTS 
A. Protein folding rates 

Here we considered the effect of introducing a three-body potential to an off-lattice two-body Go model studied 
previously [3^ Issi Lof . Eighteen mentioned single-domain proteins that are known to fold by a two-state mechanism 
were selected, and coarse-grained so that each amino acid corresponds to a bead at the position of the Ca atom. Long 
simulations at the folding temperature Tf for a subset of the proteins showed a single exponential distribution of first 
passage times: P{t) ^ exp{—Kt). For these proteins the simulated log folding rate, log(K), correlated very strongly 
(r=0.997) with the free energy barrier height AF^ , indicating that AF^ was an accurate predictor of the rate for 
the simulated Go models. We subsequently assume this proportionality between AF^ and — log(K) for all simulated 
proteins, referring to exp{— AF^ / hgT) as the "effective rate". 

The above mentioned discrepancy between the effective protein rates for our data set and the experimentally 
determined rates for the same proteins motivates an investigation of the effect of many-body interactions on rates. 
When a portion of the total energy is attributable to many-body interactions, energetic gain is not achieved until 
a larger amount of native structure is present, with a correspondingly larger entropic cost. Several polymer loops 
must be simultaneously closed during folding to receive energetic gain. This effect enhances the dependence of rate 
on contact order, increasing the range over which rates vary. 

By attributing a fraction a of the native energy to triples in the native structure, we studied the effects of three- 
body interactions by varying this single parameter (see Methods). The effects on the free energetic potential surface 
for several proteins are shown in figure ^ 
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As the fraction of 3-body energy is increased, the correlation of the simulated effective rates with both absolute 
and relative contact order increases (figure |3 a, b). This effect has also been seen in lattice protein models 0,^^. 
We can also quantify how much 3-body energy, at the residue level, reproduces the experimental dispersion in rates 
for single-domain proteins. The simulated effective rates span 6 orders of magnitude when approximately 20% of the 
energy in the native state of the coarse-grained protein is due to 3-body interactions. 

Rates simulated with a 2-body Hamiltonian do not correlate significantly with experimentally determined rates at 
25°C (figure[21C). We can remove the effects due to variations in stability and reflect the conditions in the simulations 
by taking instead the rate data at the various transition midpoints (after the addition of GdHCl). We then found 
the correlation significantly increased to r = 0.64, p = O.Of 8. Adding 3 body energy in the simulations increases the 
correlation with the experimental rates (at the transition midpoints) still further, with the best correlation achieved 
when a = 10% (see figure |5Ji). 

These results strongly suggest that 1) stability is an important determinant of folding rate, 2) many-body energy is 
present in the energy functions of real proteins, and 3) Go or Go-like models (which ignore non-native interactions) can 
predict experimental rates, illustrating the minor importance of non-native interactions in governing folding barriers. 

The correlation of log rates with rCO also improves as a is increased from zero, however the correlations are modest, 
increasing from (r = 0.29, P = 0.24) at a = to a best correlation of (r = -0.44, P = 0.08) at a = 10% (data not 
shown) . 

B. Testing pair interaction matrices 

The correlation between experimental and simulational 0- values for a 2-body Hamiltonian (rg, Pq) was typically 
not statistically significant (see table I), with the exception of SH3. Rank ordered measures of correlation such as 
Kendall's tau, which are insensitive to the precise values of the data, generally do not improve the agreement (table II). 
We also checked whether simulations with a 2-body Hamiltonian could accurately predict residues that had higher-c/) 
values. This was done by weighting the statistical averaging in the correlation coefficient by the experimental value 
itself as a Jacobian factor. Implementing this recipe did not substantially increase the correlation coefficient, and in 
fact decreased it in the cases of AcP and CI2 (table I). Similar results were obtained by implementing a simple cut-off 
imposing a lower bound for relevant experimental (/>- values (data not shown). 

The experimental data can be used to test energy functions characterizing pair-interactions at the amino acid 
level, such as the Miyazawa-Jernigan (MJ) matrix jSgl- We investigated whether MJ interaction parameters improved 
the simulational predictions of ^-values, by interpolating between a homogeneous Go model and a model with pair 
interactions (between all residues) governed by MJ parameters (see equation ©). We also interpolated between a 
homogeneous Go model and a heterogeneous Go model with native interaction parameters determined from the MJ 
matrix. 

Results are shown for two proteins in figure|31 For CI2 and SH3, no improvement in the correlation with experimental 
data was seen by implementing this procedure. Table I shows the results for the comparison between experimental 0- 
value data and values obtained from a pairwise MJ Hamiltonian. In general if correlations increased by interpolating 
toward MJ parameters they did so only modestly- only in the case of protein L did the improvement reach statistical 
significance {P — 1%, see table I). 

To check of the validity of the recipe of interpolating toward MJ parameters, we compared the largest improvement 
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in correlation (r^* — fo) with the value a* of three body energy required to achieve that correlation. This tests whether 
the poorness of the original correlation was due to the absence of MJ coupling energies. We found that {va* — ro) 
itself correlated well with a* , however the statistical significance was not particularly strong, and the slope measuring 
the degree of improvement was not particularly high (see figure 0)1 . 

C. Testing three-body interactions 

The experimental data can also be used as a benchmark to test what amount of 3-body energy in the Hamiltonian 
of the coarse-grained model gives best agreement with experimental 0- values . We examined this question for the 5 
proteins in table I, by measuring the correlation between the experimentally obtained ^-values, and ^-values of the 
same residues determined from simulations, with conditions ranging from between a pair-wise interacting Go model 
protein, and one governed exclusively by 3-body interactions at the residue level (see methods) . 

As the strength of 3-body interactions increased from zero, the correlation coefficient also increased, for all proteins 
studied (see fig. Inland table I). An exceptional case was SH3, which showed only a modest increase in correlation for 
the kinetically determined transition state ensemble, and no increase for the thermal transition state ensemble. The 
fraction a* of native 3-body energy that gave best agreement with experimental data varied from protein to protein, 
but correlated strongly with the increase in agreement with experimental data (see table I). That is, the improvement 
in correlation (r^. — Tq) itself correlated very strongly with a* (r — 0.97, P — 0.005), further supporting the notion 
that the poorness of the original agreement was due at least in part to the absence of many-body forces. 

For a protein such as C12 with large fraction of 3-body energy, the transition states in the presence of 3-body 
interactions is significantly different than the 2-body transition state. For C12, the root mean square distance (RMSD) 
between all 3f5 structures in the kinetic transition state ensemble (KTSE) was found for both the 2-body and 
2-f 3-body (at a*) cases. Shown in figure B is the "most representative" transition state structure for the 2- 
body and 2-|- 3-body cases respectively, defined as having the minimal Boltzmann-weighted RMSD (minimum over 
structure i of J^j Pi(R-MSD)y ) to all others in the KTSE. The 2-body case shows more overall secondary structure, 
in particular more a-helix, but less /3-sheet. The Q, Qco (see methods), and R (RMSD from the native structure) 
values for the structures in figure |3V,B are Q<*' = 0.54, g'^^ = 0.49, Q^co = 0.41, = 0.29, and i?<*' = 5.5 A, 
i?'^' = 11 A. This indicates that the 2-|-3-body transition state is less structured than the pure 2-body transition state. 
However, kinetically they are about the same distance from the native structure, with Pfold values ptoVo = 0.55, 
Pfold = 0.53 Q|. They have a RMSD of 7.8 A between them, so they are structurally distinct from each other. 
The average RMSD values from the native for the top 4 transition state structures for the 2-body and (2-|-3)-body 
cases are r'^' = 6.3 A, and 7?'^^^' = 8.5 A, again confirming less native structure in the more accurate transition 
state containing 3-body interactions. Interestingly, the high-0 residue 34 has more local secondary structure in the 
pure 2-body case than at a*. It also has no triples in the native state. Its high 0- value in the presence of 3-body 
interactions is the result of correlations with other triples made in the transition state. 

The procedure of adding 3-body interactions was repeated considering only residues in the hydrophobic core of 
native structure, in this case buried with less than w 30% accessible surface area using the Swiss PDB algorithm. 
I http: / / www.expasy.org/ spdbvl) . We saw qualitatively the same effect, but the change in correlation coefficient was 
less pronounced, increasing to about 0.42 for CI2 for example. This implies that coarse-grained model proteins with 
effective solvent-averaged interactions have many-body interactions involving residues on the surface as well. 
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III. DISCUSSION 

The above results suggest that many-body interactions can play a significant role in governing the folding mecha- 
nisms of 2-state proteins when described at the residue level. This seems quite evident upon comparing the statistical 
significance columns in table I or table II for the pure 2-Body Hamiltonian and the 2+3-body Hamiltonian at a* . In 
essentially all cases, many-body interactions helped to establish consistency with protein folding experiments. Some 
proteins showed dramatic improvement, others mild improvement, so proteins may be additionally classified through 
this effect. The value of a* may be used as an indication of the importance of many-body interactions in governing 
the folding mechanism for a given protein, as for example the proteins are ranked in tables I and II. 

Experimental rates vary by about 4 orders of magnitude more than rates obtained from coarse-grained models using 
2-body Hamiltonians. However a modest 3-body component to native stability (about 20% on average) was sufficient 
to reproduce the experimental variability in folding rates. It is an open question as to how large the many-body 
component might be in finer-scale and all-atom models of proteins. Ab initio studies of interaction energies and 
reconfiguration barriers in water clusters suggest they can be quite significant Q . 

For FKBP, protein L, and CI2 the correlation between experimental and simulational ((> values goes from insignificant 
to significant as 3-body interactions are added. In the case of CI2, the agreement between simulations with a 2-body 
energy function and experimental data was the poorest of the proteins studied, the fraction of 3-body energy at best 
agreement was the largest, and the improvement in correlation coefficient the most dramatic. In the case of SH3 on 
the other hand, the folding mechanism appears to be governed more by topology than by energetic considerations. In 
some sense this is an exception that proves the rule, since previous evidence supported a folding mechanism dominated 
by topological considerations [43|, |50| . 

Interestingly, muscle acylphosphatase had the poorest improvement in mechanism prediction by adding 3-body 
interactions, as measured by the correlation coefficient. Its original (/)-correlation for a 2-body Go model was the 
second poorest after CI2. It also required the largest amount of Miyazawa-Jernigan interactions for best agreement 
with experimental 0-values , but still correlated poorly even at best agreement. Intriguingly it is also the slowest 
known 2-state folder at present, yet a good 2-state folder with no intermediates The slow folding is likely due to 
large contact order however, and it would be interesting in the future to apply the 3-body recipe to a topologically 
similar but faster folding protein such as human procarboxypeptidase A2. On the other hand, the improvement for 
AcP as measured by Kendall's tau does in fact become statistically significant, and suggests a large 3-body component. 
We are inclined to take this more robust measure of statistical significance more seriously. The discrepancy of r and 
T indicates some large outliers in (/)-values , likely due to variations in native stabilizing interactions, which may exist 
for functional reasons. These fiuctuations in native interaction strength arc not captured by the uniform Go model 
and 2 -f 3-body models. 

The largest improvement in correlation (r^* — Tq) with the value of interpolation parameter a* required to achieve 
that correlation was used as a measure to test the validity of the 3-body and Miyazawa-Jernigan interpolation recipes. 
The results for the 3-body interpolation recipe showed a strong statistically significant correlation with large slope 
indicating large rate of improvement. The results for the heterogeneous MJ Go model also showed improvement, 
however with smaller slope and smaller statistical significance. It is noteworthy that for the case where CI2, where 
the 3-body recipe does the best, the MJ recipe failed to improve the agreement with experiment. 
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For CI2, the transition state in the presence of 3-body interactions shows less overall native structure than the purely 
2-body transition state, in spite of the better agreement with experimental </)-values for the 3-body case. However it 
is not clear that this will be a general rule. In both cases the transition state consists largely of a disordered form of 
the native topology, sufBciently disordered to be kinetically balanced between the folded and unfolded states. 

The low levels of agreement between experiment and simulation for 2-body Hamiltonians told a somewhat cau- 
tionary tale. While a large body of evidence leaves little doubt as to the importance of native topology in governing 
folding mechanism, these results should serve to show that realistic aspects of the energy function, such a many-body 
component to native stability, should not be ignored. 
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TABLE I: Two-body and Three-body characterization of proteins studied 

Go MODEL'' MJ MODEL MJ-Go MODEL 3-BODY MODEL 



Proteins (PDB) " 


ro Po 


(y, T^a* Pol* 


Q* Tq* Pa' 


a* ra* Pa' 


SH3 (ISRL) 
FKBP (IFKB) 
AcP (lAPS) 
Protein L (2PTL) 
CI2 (2CI2) 


0.58 " 0.0003 
0.32 0.17 
0.12 0.58 
0.18 0.25 
-0.10 " 0.56^ 


0% 0.59 0.0003 
10% 0.41 0.07 
50% 0.35 0.1 
20% 0.38 0.01 

0% -0.017 0.92 


5% 0.59 0.0002 
20% 0.38 0.1 
30% 0.30 0.16 
30% 0.38 0.01 

0% -0.017 0.92 


5% 0.60 " 0.0001 
10% 0.43 0.057 
15% 0.32 0.14 
15% 0.53 0.00027 
35% 0.57 " 0.0004 


Continue 


iV » N2'' N'i' 


3-BODY MODEL 


HIGH-<^ WEIGHTING 

ro" Po" 


SH3 (ISRL) 
FKBP (IFKB) 
AcP (lAPS) 
Protein L (2PTL) 
CI2 (2CI2)) 


56 128 32 35 
107 299 111 20 
98 257 97 23 
62 126 30 41 
65 148 54 35 


5% 3.8 ± 0.2 1.4 2.6%" 
10% 10 ± 0.8 1.5 5.5% 
15% 14 ± 2.0 2.2 8.9% 
15% 6.2 ± 0.5 2.8 3.3% 
35% 17 ± 3.5 3.4 13%" 


0.65" 2.7 X lO"'' 

0.37 0.10 
-0.02 0.91 

0.26 0.10 
-0.43" 0.01 



"Sources for experimental </>- value data: src-SH3 domain 53], FKBP |3, AcP CI2 protein L IZtI . 
'Correlation coefficient and statistical significance between experiments and simulations of a pair-wise interacting G5 model. 
'^a* is in general the value of the interpolation parameter that gives best agreement with experimental data for corresponding model. 
For the MJ models eq. jgj is used, for the 3-body models eq. Q is used. 
"^ra' and Pa' are the correlation coefficient and statistical significance respectively, at best agreement for the corresponding model. 
'^Kinetic transition state (KTSE) has been used. 

^We allow for the possibility of anti-cooperativity in proteins, and hence ascribe statistical significance to negative correlations. Thus 
P-values here are the 2-sided statistical significance. 
'Chain length. 

''Number of native pair contacts. 
'Number of native triples. 

^Number of 0-value data points used in the comparison. 
^Barrier height in ksT at a* . 

'Ratio of the free energy barriers when a =a* and a = 0. 
""Fraction of 3-body energy in the transition state ensemble at a* . 

"Correlation coefficient and statistical significance including a Jacobian factor weighting each term in the correlation function by the 
experimental t/i-value itself, i.e. averages are calculated as {A) = (JZ" </'™''^i)/(E 1 4^^^) where n is the number of data points. This is a 
recipe simply to stress the importance of the agreement between large ^-values . 



TABLE II; Kendall's r and Statistical significance between experiment and simulation 



Proteins (PDB) 


To 


Go MODEL" 

Po 


a* 


3-BODY MODEL' 

Ta' 


Pa' 


SH3 (ISRL) 


0.42 " 


0.00044 


0% 


0.42 " 


0.00044 


FKBP (IFKB) 


0.27 


0.10 


10% 


0.31 


0.055 


Protein L (2PTL) 


0.14 


0.19 


20% 


0.36 


0.00069 


AcP (lAPS) 


0.14 


0.37 


25% 


0.33 


0.027 


CI2 (2CI2) 


0.042 " 


0.72 


35% 


0.40 " 


0.0008 



"Kendall's tau measure of ranked correlation and statistical significance (P([t'| > |r|)) of tau value, between experiments and simulations 
of a pair-wise interacting Go model. 

'o* is the value of the interpolation parameter that gives best agreement with experimental data for a 2-f 3-body Hamiltonian as in 
eq. 0. Ta' and Pa* are Kendall's t and statistical significance respectively, at best agreement for the 2-|-3-body model. 

'^Kinetic transition state (KTSE) has been used. 
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FIGURE CAPTIONS 

Figure: ^ 

The folding barrier height l^F^ increases with increasing three-body contribution to the energy a. Inset (A) shows 
the free energy vs. the fraction of native contacts Q for CI2, for 3 values of a. Main panel shows the barrier vs. a 
for 4 proteins selected from table I. Inset (B): the average slope of /S.F^ vs. a correlates strongly with the number 
of 3-body interactions in the native state (r = 0.89, p — 10^^). Therefore the barriers in the main panel increase at 
different rates due to differing numbers of triples formed in the transition states of the various proteins- more native 
triples typically means a larger 3-body contribution to the barrier. The shaded region in inset (A) corresponds to the 
thermal transition state ensemble described in the methods section. In general this ensemble depends on a. 

Figure: El 

Comparison of simulated and experimental rates. (A): Simulated folding barriers (effectively measuring log folding 
rates for 18 proteins listed in methods) for a pair-wise interacting Go model correlate well with absolute contact order 
[aCO) jsj]. (B): Simulated folding barriers show an increased correlation with aCO, when the fraction of native 
three-body energy is such that the dispersion in effective simulated rates matches the experimental dispersion for this 
data set [a — 20%). Rates now span 5.7 decades, in contrast to 2 decades for a pure 2-Body Hamiltonian (dashed 
line in (B) is the best fit line in (A)). (C): For 13 of the 18 proteins (see methods for a list), rate data was available 
for various different denaturant concentrations. These proteins were used for the analysis in figures C and D. Panel 
(C) shows that for these proteins, the simulated effective log rates do not correlate significantly with the experimental 
rate data at 25° C (D): Tuning the rate data to the transition midpoints and introducing 3-body energy in the 
native state, we saw a significant increase in the correlation between experimental and simulated rate data, with best 
correlation when a = 10%. 

Figure: |31 

Comparison of the agreement of (/)-values between simulation and experiment for (A) CI2, and (B) src SH3. Green 
curves in A and B show the correlation coefficient and statistical significance (insets) for i/i-values derived from the 
thermal transition state (TTSE) in the simulations, as the Hamiltonian was continuously changed from a uniform Go 
model to one with pair interactions governed by Miyazawa Jernigan parameters (the curve shown in inset A is the 
statistical significance of the anf j-correlation in the main panel) - see equation l^. No improvement was seen for CI2 
or SH3 by implementing this recipe. Red and Blue curves show the correlation coefficient and statistical significance 
between experimental and simulated values as a function of the fraction a of three-body energy in the native state. 
Blue curves correspond to TTSE, Red curves- kinetic transition state ensemble (KTSE). For CI2 the improvement as 
a is increased is dramatic, with best agreement with experiment around 35% 3-body energy. On the other hand, SH3 
was exceptional in that it showed the opposite trend, with best agreement for a purely pair-wise interacting model for 
the TTSE and a — 5% for the KTSE. All other proteins studied were bracketed by these two extremes- they showed 
moderate components of 3-body energy, with moderate to large increases in correlation coefficient (table I) . 
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Figure: |31 

Plot of the largest improvement in correlation (ro,* — Tq) vs. the value of interpolation parameter a* required to achieve 
that correlation. Energy functions are interpolated toward a 3-body Go model (eq. and 2-body models with 
Miyazawa-Jernigan energetic parameters (eq. The slope and correlation indicate the validity of the interpolation 
procedure. Adding 3-body energies gives a slope of 2.2, and (r, P) = (0.97, 0.005). Adding a MJ component to the pair 
interaction energies gives a slope of 0.29 but a fit that is not statistically significant: (r, P) — (0.83, 0.38). Restricting 
the MJ component to native interaction energies gives a statistically significant fit, (r, P) = (0.956,0.044), but with 
a shallow slope (0.78) indicating only moderate improvement. 

Figure: \S\ 

The "most representative" transition state structure for the 2-body (A) and 2-1- 3-body (B) cases of CI2, defined 
as the structure having minimal Boltzmann- weighted RMSD to all other structures in the KTSE (see text), (left 
column: representation showing secondary structure, right columns: stereographic views superimposed on the native 
structure (structures generated with molmol)). The 2-body case shows more overall secondary structure, in particular 
more a-helix, but less /9-sheet. (C): (jj-vahie vs. residue index for CI2, for experiment (Blue), simulated pair-wise Go 
model (light blue background), and 2-1- 3-body Go model (Red). The average 0- values for the various energy functions 
are 0*'^'''''' = 0.25, 0*"' = 0.40, 0'^^^' — 0.33, again confirming the more accurate 2-1- 3-body transition state is less 
structured. It is worth noting that native state is more stable in the experiments than in the simulations- the native 
stability is fixed at the transition midpoint in the simulations, regardless of the value of a. 
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